Methods for scalable data assimilation

Colin Grudzien1,2, Marc Bocquet3, Sukhreen Sandhu2 et al.


  1. Center for Western Weather and Water Extremes (CW3E), Scripps Institution of Oceanography, University of California San Diego, San Diego, CA, USA
  2. Department of Mathematics and Statistics, University of Nevada, Reno, Reno, NV, USA
  3. CEREA, A joint laboratory of École des Ponts Paris Tech and EDF R&D, Université Paris-Est, Champs-sur-Marne, France.

Research themes

  • Data assimilation (DA) refers to methodology for combining:

    • data from physics-based, numerical models; and
    • real-world observations
  • to produce an improved estimate for the state of a time-evolving, random process and the parameters that govern its evolution.

  • My research experience is in developing scalable data assimilation methodology.

  • I utilize statistical, mathematical and numerical tools for understanding:

    • the stability and convergence of these estimators;
    • the robustness and reliability of their inferences in the presence of modeling errors and bias;
    • and the numerical efficiency of these schemes
  • for problems inspired by high-dimensional, chaotic dynamical systems characteristic of weather and climate.

Motivation

  • A Bayesian approach to DA is widely adopted because it provides a unified treatment of the tools from statistical estimation, nonlinear optimization and machine learning.
  • Suppose that the dynamical physical states can be written in a vector, \( \pmb{x}_k \in \mathbb{R}^{N_x} \), where \( k \) corresponds to some time \( t_k \).
  • Abstractly, the time-evolution of these states is represented by the nonlinear map \( \mathcal{M} \), \[ \begin{align} \pmb{x}_k = \mathcal{M}_{k} \left(\pmb{x}_{k-1}, \boldsymbol{\lambda}\right) + \boldsymbol{\eta}_k \end{align} \] where
    • \( \pmb{x}_{k-1} \) is the vector of physical states at an earlier time \( t_{k-1} \);
    • \( \boldsymbol{\lambda} \) is a vector of uncertain physical parameters on which the time evolution depends;
    • \( \boldsymbol{\eta}_k \) is an additive, stochastic noise term, representing errors in our model for the physical process.
  • We wish to estimate the random vector \( {\color{#d95f02} {\pmb{x}_k } } \), given a prior distribution on \( \left(\pmb{x}_{k-1}, \boldsymbol{\lambda}\right) \), knowledge of \( \mathcal{M}_{k} \), and knowledge of the distribution of \( \boldsymbol{\eta}_k \).
  • For the rest of this discussion, we restrict to the case that \( \boldsymbol{\lambda} \) is a known constant, and the forecast model is perfect, \[ \begin{align} \pmb{x}_k = \mathcal{M}_{k} \left(\pmb{x}_{k-1}\right), \end{align} \] for simplicity.
  • However, a strength of the Bayesian approach is that it is easily extended to handle the more general case.

Motivation

  • At time \( t_{k-1} \), we make a forecast for the distribution of \( \pmb{x}_k \) with our prior knowledge, including the physics-based model.
  • After some time, we are given real-world observations \( {\color{#7570b3} {\pmb{y}_k\in\mathbb{R}^{N_y}} } \) related to the physical states by, \[ {\color{#7570b3} { \pmb{y}_k = \mathcal{H}_k \left(\pmb{x}_k\right) + \boldsymbol{\epsilon}_k } } \] where
    • \( {\color{#7570b3} {\mathcal{H}_k:\mathbb{R}^{N_x} \rightarrow \mathbb{R}^{N_y} } } \) is a nonlinear map relating the states we wish to estimate \( \pmb{x}_k \) to the values that are observed \( {\color{#7570b3} { \pmb{y}_k} } \).
      • Typically \( N_y \ll N_x \), so this information is sparse and observations are not \( 1:1 \) with the unobserved states.
    • \( \boldsymbol{\epsilon}_k \) is an additive, stochastic noise term representing errors in the measurements.
  • At time \( t_k \) we have a forecast distribution for the states \( \pmb{x}_k \) generated by our prior on \( \pmb{x}_{k-1} \) and our state model \( \mathcal{M} \).
  • We also have an observation \( \pmb{y}_k \) with uncertainty.
  • The basic goal of Bayesian DA is to find a posterior distribution for \( \pmb{x}_k \) conditioned on \( \pmb{y}_k \), or some statistic of this distribution.
  • For operational forecasting, this furthermore should be performed sequentially and recursively in time.
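The perfect-model state equation and the noisy observation equation above can be sketched numerically. The following is a minimal, illustrative simulation assuming a stable linear stand-in for \( \mathcal{M}_k \) and a sparse linear observation operator; all names, dimensions, and noise levels are hypothetical, not from the text.

```python
import numpy as np

# Sketch of x_k = M(x_{k-1}) (perfect model) and y_k = H(x_k) + eps_k;
# the maps, dimensions, and noise levels are illustrative assumptions.
rng = np.random.default_rng(0)
N_x, N_y = 4, 2                       # typically N_y << N_x

M = 0.9 * np.eye(N_x)                 # a stable, linear stand-in for M_k
H = np.eye(N_y, N_x)                  # observe only the first N_y components
R = 0.25 * np.eye(N_y)                # observation error covariance

def forecast(x_prev):
    """Perfect-model forecast x_k = M_k(x_{k-1})."""
    return M @ x_prev

def observe(x):
    """Noisy observation y_k = H_k(x_k) + eps_k, with eps_k ~ N(0, R)."""
    return H @ x + rng.multivariate_normal(np.zeros(N_y), R)

x = rng.standard_normal(N_x)
states, observations = [], []
for k in range(10):
    x = forecast(x)
    states.append(x)
    observations.append(observe(x))
```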

Observation-analysis-forecast cycle

Diagram of the filter observation-analysis-forecast cycle.

  • Recursive estimation of the distribution for \( \pmb{x}_k \) conditional on \( \pmb{y}_k \) can be described as an:
    1. observation
    2. analysis
    3. forecast
  • cycle.
  • Assume an initial forecast-prior for the physics-based numerical model state.
  • Using the forecast-prior and the likelihood of the observation,
    • we estimate the Bayesian update of the prior to the posterior
    • conditioned on the observation.

Smoothing

Diagram of the filter observation-analysis-forecast cycle.

  • Recursive estimates of the current state can be performed in this fashion, but a related question regards the past states.
  • New observations from the future give information about the model states at past times.
  • This produces a retrospective posterior estimate for the past states.
  • Recursive estimation of the present state is known as filtering.
  • Conditional estimation of a past state given future observations is known as smoothing.
  • Note that the filtering density for the current time is actually just a marginal of the joint posterior over all states in the data assimilation window (DAW) \[ \begin{align} p(\pmb{x}_3 \vert \pmb{y}_3, \pmb{y}_2, \pmb{y}_1) = \int \int p(\pmb{x}_3, \pmb{x}_2, \pmb{x}_1 \vert \pmb{y}_3, \pmb{y}_2, \pmb{y}_1)\mathrm{d}\pmb{x}_2 \mathrm{d}\pmb{x}_1 \end{align} \]
  • A smoothing estimate may be produced in a variety of ways, exploiting different formulations of the Bayesian problem.
  • We may estimate only a marginal as on the left-hand-side, or the entire joint posterior as in the integrand above.1
  • Depending on how the Bayesian analysis is performed, one can design different estimators to exploit the operational problem.
1. Cosme, E., et al. (2012). Smoothing problems in a Bayesian framework and their linear Gaussian solutions. Monthly Weather Review, 140(2), 683-695.

Hidden Markov models

  • Recall the physical state model and forward observation model, \[ \begin{align} \pmb{x}_k &= \mathcal{M}_{k} \left(\pmb{x}_{k-1}\right)\\ \pmb{y}_k &= \mathcal{H}_k \left(\pmb{x}_k\right) + \boldsymbol{\epsilon}_k \end{align} \]
  • Denote state model and observation model time series for \( k < l \) as, \[ \begin{align} \pmb{x}_{l:k} := \left\{\pmb{x}_l, \pmb{x}_{l-1}, \cdots, \pmb{x}_k\right\}, & & \pmb{y}_{l:k} := \left\{\pmb{y}_l, \pmb{y}_{l-1}, \cdots, \pmb{y}_k\right\}. \end{align} \]
  • Assume that the sequence of observation errors \[ \begin{align} \{\boldsymbol{\epsilon}_k : k=1,\cdots, L, \cdots\} \end{align} \] is independent in time.
  • The above formulation is a type of hidden Markov model;
    • the dynamic state variables \( \pmb{x}_k \) are known as the hidden variables, because they are not directly observed.
  • These assumptions determine the Markov probability densities for the hidden variables, i.e., \[ \begin{align} p\left(\pmb{x}_{L:0}\right) &= p\left(\pmb{x}_{L} \vert \pmb{x}_{L-1:0}\right) p\left(\pmb{x}_{L-1:0}\right)\\ &= p\left(\pmb{x}_{L} \vert \pmb{x}_{L-1}\right) p\left(\pmb{x}_{L-1:0} \right). \end{align} \]
  • Applying the Markov property recursively, \[ p\left(\pmb{x}_{L:0}\right) = p(\pmb{x}_0) \prod_{k=1}^{L} p(\pmb{x}_k|\pmb{x}_{k-1}). \]
  • Similarly, \[ p\left(\pmb{y}_{k}\vert \pmb{x}_k, \pmb{y}_{k-1:1}\right) = p\left(\pmb{y}_k \vert \pmb{x}_k\right)=p_{\pmb{\epsilon}_k}\left(\pmb{y}_k - \mathcal{H}_k(\pmb{x}_k)\right) \] by the independence assumption on the observation errors.

Bayesian inference

  • Using these assumptions and Bayes' law, the filtering cycle is written as \[ \begin{align} {\color{#d95f02} {p\left(\pmb{x}_{L} \vert \pmb{y}_{L:1}\right)} } &=\frac{ {\color{#7570b3} { p\left(\pmb{y}_L \vert \pmb{x}_L\right) } } {\color{#1b9e77} { p\left(\pmb{x}_L\vert \pmb{y}_{L-1:1}\right) } } }{p\left(\pmb{y}_L\vert \pmb{y}_{L-1:1}\right)}. \end{align} \]
  • where:
    • \( {\color{#d95f02} {p\left(\pmb{x}_{L} \vert \pmb{y}_{L:1}\right)}} \) – this is the posterior estimate for the hidden states (at the current time) given all observations \( \pmb{y}_{L:1} \);
    • \( {\color{#7570b3} {p\left(\pmb{y}_{L}\vert \pmb{x}_{L}\right)}} \) – this is the likelihood of the observed data given our model forecast;
    • \( {\color{#1b9e77} {p\left(\pmb{x}_L\vert \pmb{y}_{L-1:1}\right)}} \) – this is the model forecast-prior from our last best estimate of the state.
      • Suppose that the posterior probability density \( {\color{#d95f02} {p(\pmb{x}_{L-1} \vert \pmb{y}_{L-1:1})}} \) at the last observation time \( t_{L-1} \) is computed;
      • then the model forecast probability density is given by, \[ \begin{align} {\color{#1b9e77} {p(\pmb{x}_L\vert \pmb{y}_{L-1:1})} } &= \int p(\pmb{x}_L \vert \pmb{x}_{L-1}) {\color{#d95f02} {p(\pmb{x}_{L-1} \vert \pmb{y}_{L-1:1})} }\mathrm{d}\pmb{x}_{L-1} \end{align} \]
    • \( p\left(\pmb{y}_L \vert \pmb{y}_{L-1:1}\right) \) is the marginal of joint density \( p( \pmb{y}_L, \pmb{x}_L \vert \pmb{y}_{L-1:1}) \) integrating out the hidden variable, \[ p\left(\pmb{y}_L \vert \pmb{y}_{L-1:1}\right) = \int p(\pmb{y}_L \vert \pmb{x}_L) p(\pmb{x}_L \vert \pmb{y}_{L-1:1})\mathrm{d}\pmb{x}_{L}. \]
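In the linear-Gaussian case, both the forecast integral and the Bayesian update above have closed Gaussian forms, giving the classic Kalman filter recursion. A hedged sketch follows; the function and variable names are illustrative, and an additive model error covariance `Q` is allowed (set `Q = 0` for the perfect model assumed in the text).

```python
import numpy as np

# Closed-form linear-Gaussian cycle: propagate the posterior N(m, B) through
# the linear model to get the forecast density, then apply Bayes' law.
def kalman_cycle(m, B, y, M, Q, H, R):
    # forecast density p(x_L | y_{L-1:1}) = N(m_f, B_f)
    m_f = M @ m
    B_f = M @ B @ M.T + Q
    # Bayes' law: p(x_L | y_{L:1}) ∝ p(y_L | x_L) p(x_L | y_{L-1:1})
    S = H @ B_f @ H.T + R                    # innovation covariance
    K = B_f @ H.T @ np.linalg.inv(S)         # Kalman gain
    m_a = m_f + K @ (y - H @ m_f)            # posterior mean
    B_a = (np.eye(len(m)) - K @ H) @ B_f     # posterior covariance
    return m_a, B_a
```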

Bayesian MAP estimates

  • Typically, the probability density in the denominator \[ p\left(\pmb{y}_L \vert \pmb{y}_{L-1:1}\right) = \int p(\pmb{y}_L \vert \pmb{x}_L) p(\pmb{x}_L \vert \pmb{y}_{L-1:1})\mathrm{d}\pmb{x}_{L} \] is mathematically intractable.
    • However, the denominator is independent of the hidden variable \( \pmb{x}_{L} \);
      • its purpose is only to normalize the integral of the posterior density to \( 1 \) over all \( \pmb{x}_{L} \).

  • Instead, as a proportionality statement, \[ \begin{align} {\color{#d95f02} {p\left(\pmb{x}_{L} \vert \pmb{y}_{L:1}\right)} } &\propto {\color{#7570b3} { p\left(\pmb{y}_L \vert \pmb{x}_L\right) } } {\color{#1b9e77} { p\left(\pmb{x}_L\vert \pmb{y}_{L-1:1}\right) } } , \end{align} \]
  • we can devise the Bayesian maximum a posteriori (MAP) estimate as the choice of \( \overline{\pmb{x}}_L \) that maximizes the posterior density,
    • but written in terms of the two right-hand-side components.
  • When maximizing the posterior density, the denominator contributes only a constant, which we can neglect.
    • Computing the MAP sequentially and recursively in time, we only need a recursion in proportionality as above.

Bayesian MAP estimates

  • Under the “linear-Gaussian” assumption, the optimal filtering problem is equivalent to minimizing the Bayesian MAP cost function \[ \begin{align} {\color{#d95f02} {\mathcal{J}(\pmb{x}_{L})} } &= {\color{#1b9e77} {\frac{1}{2}\parallel \overline{\pmb{x}}_{L}^\mathrm{fore} -\pmb{x}_{L}\parallel_{\mathbf{B}_{L}^\mathrm{fore}}^2} } + {\color{#7570b3} {\frac{1}{2}\parallel\pmb{y}_L - \mathbf{H}_L \pmb{x}_L\parallel_{\mathbf{R}_L}^2} }, \end{align} \] where the above weighted norms can be understood as:
    • \( \parallel \circ \parallel_{\mathbf{B}_k^\mathrm{fore}} \) weighting relative to the forecast covariance (spread); and
    • \( \parallel \circ \parallel_{\mathbf{R}_k} \) weighting relative to the observation error covariance (imprecision).

  • The MAP state interpolates the forecast mean and the observational data relative to the uncertainty in each piece of data.
  • To render the cost function into the right-transform analysis, introduce the matrix factor \[ \begin{align} \mathbf{B}_{L}^\mathrm{fore} : = \boldsymbol{\Sigma}_{L}^\mathrm{fore} \left(\boldsymbol{\Sigma}_{L}^\mathrm{fore}\right)^\top. \end{align} \]
  • The analysis can be written in terms of optimizing weights \( \pmb{w} \) where \[ \begin{align} \pmb{x}_L := \overline{\pmb{x}}_L^\mathrm{fore} + \boldsymbol{\Sigma}_L^\mathrm{fore}\pmb{w}; \end{align} \]
  • the equation written in terms of the weights is given as \[ \begin{align} {\color{#d95f02} {\mathcal{J}(\pmb{w}) } } = {\color{#1b9e77} {\frac{1}{2} \parallel \pmb{w}\parallel^2} } + {\color{#7570b3} {\frac{1}{2} \parallel \pmb{y}_L - \mathbf{H}_L \overline{\pmb{x}}_L^\mathrm{fore} - \mathbf{H}_L \boldsymbol{\Sigma}_L^\mathrm{fore} \pmb{w} \parallel_{\mathbf{R}_L}^2 } }. \end{align} \]

Bayesian MAP estimates

  • Make the following definitions: \[ \begin{align} \overline{\pmb{y}}_L = \mathbf{H}_L \overline{\pmb{x}}_L^\mathrm{fore}, & & \overline{\pmb{\delta}}_L &= \mathbf{R}^{-\frac{1}{2}}_L \left(\pmb{y}_L - \overline{\pmb{y}}_L\right), & & \boldsymbol{\Gamma}_L =\mathbf{R}_L^{-\frac{1}{2}}\mathbf{H}_L \boldsymbol{\Sigma}_L^\mathrm{fore}. \end{align} \]
    • The vector \( \overline{\pmb{y}}_L \) is the forecast mean mapped to observation space.
    • The vector \( \overline{\pmb{\delta}}_L \) is the innovation vector, weighted by the observation uncertainty.
    • \( \boldsymbol{\Gamma}_L \) in one dimension would equal the standard deviation of the model forecast relative to the standard deviation of the observation error.
  • Then, the MAP cost function is further reduced to \[ \begin{align} {\color{#d95f02} {\mathcal{J}(\pmb{w}) } } = {\color{#1b9e77} {\frac{1}{2} \parallel \pmb{w}\parallel^2}} + {\color{#7570b3} {\frac{1}{2} \parallel \overline{\pmb{\delta}}_L - \boldsymbol{\Gamma}_L \pmb{w} \parallel^2 } } \end{align} \]
  • This cost function is quadratic in \( \pmb{w} \) and can be globally minimized where \( \nabla_{\pmb{w}} \mathcal{J} = \pmb{0} \).
  • Solving this is similar to solving the normal equations of least-squares regression;
    • indeed, the Kalman filter is also the best linear unbiased estimator (BLUE) for linear-Gaussian hidden Markov models.2
  • Note, the MAP weights \( \overline{\pmb{w}} \) don’t themselves provide an update to the forecast covariance \( \mathbf{B}_L^\mathrm{fore} \).
  • A sub-optimal approach is to assume that the background covariance \( \mathbf{B}_L^\mathrm{fore} \equiv \mathbf{B} \equiv \mathbf{B}_L^\mathrm{filt} \) is completely invariant-in-time.
    • This 3D-VAR approach is computationally cost-effective, but lacks the information of the time-dependent forecast spread.
  • In the simplified linear-Gaussian model, we can find a recursive form for the filter covariance;
    • the forecast model propagates the filter covariance, so that one can recursively solve this equation in time.
2. Asch, M., Bocquet, M., & Nodet, M. (2016). Data assimilation: methods, algorithms, and applications. Society for Industrial and Applied Mathematics.

Bayesian MAP estimates

  • Setting the gradient \( \nabla_{\pmb{w}} \mathcal{J} \) to zero and solving for \( \overline{\pmb{w}} \), we find the critical point as

    \[ \begin{align} \overline{\pmb{w}} = \pmb{0} - {\boldsymbol{\Xi}_\mathcal{J}}^{-1} \nabla_{\pmb{w}} \mathcal{J}|_{\pmb{w}=\pmb{0}}. \end{align} \] where \( \boldsymbol{\Xi}_\mathcal{J}:= \nabla^2_{\pmb{w}}\mathcal{J} \) is the Hessian of the cost function.

    • This corresponds to a single iteration of Newton's descent algorithm.
  • The forecast mean is updated as,

    \[ \begin{align} \overline{\pmb{x}}_L^\mathrm{filt}:= \overline{\pmb{x}}_{L}^\mathrm{fore} + \boldsymbol{\Sigma}_{L}^\mathrm{fore} \overline{\pmb{w}}. \end{align} \]

  • Defining a right-transform matrix, \( \mathbf{T}:= \boldsymbol{\Xi}^{-\frac{1}{2}}_{\mathcal{J}} \), the update for the covariance is given as

    \[ \begin{align} \mathbf{B}^\mathrm{filt}_L = \left(\boldsymbol{\Sigma}_L^\mathrm{fore} \mathbf{T} \right)\left(\boldsymbol{\Sigma}_L^\mathrm{fore} \mathbf{T} \right)^\top. \end{align} \]

  • This derivation sketches the square root Kalman filter,3 written for the perfect, linear-Gaussian model.

  • Under the perfect model assumption, the forecast is furthermore written

    \[ \begin{align} \overline{\pmb{x}}_{L+1}^\mathrm{fore}:= \mathbf{M}_{L+1} \overline{\pmb{x}}_{L}^{\mathrm{filt}} & & \boldsymbol{\Sigma}_{L+1}^\mathrm{fore} := \mathbf{M}_{L+1}\left(\boldsymbol{\Sigma}_{L}^\mathrm{filt}\right) \end{align} \] giving a complete recursion in time in terms of the matrix factor.

3. Tippett, M. K., Anderson, J. L., Bishop, C. H., Hamill, T. M., & Whitaker, J. S. (2003). Ensemble square root filters. Monthly Weather Review, 131(7), 1485-1490.
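The weight-space analysis of this square root filter can be sketched compactly. The sketch below assumes a given matrix factor \( \boldsymbol{\Sigma} \) with \( \mathbf{B} = \boldsymbol{\Sigma}\boldsymbol{\Sigma}^\top \) and a linear observation operator; the function and variable names are illustrative, and the Cholesky factor of \( \mathbf{R} \) is used as a stand-in for \( \mathbf{R}^{\frac{1}{2}} \).

```python
import numpy as np

# One analysis step of the square root Kalman filter in weight space:
# a single Newton step minimizes the quadratic MAP cost function, and the
# symmetric inverse square root of the Hessian gives the right-transform T.
def sqrt_kf_analysis(x_fore, Sigma, y, H, R):
    R_inv_sqrt = np.linalg.inv(np.linalg.cholesky(R))  # stand-in for R^{-1/2}
    delta = R_inv_sqrt @ (y - H @ x_fore)       # weighted innovation
    Gamma = R_inv_sqrt @ H @ Sigma
    Xi = np.eye(Sigma.shape[1]) + Gamma.T @ Gamma  # Hessian Xi_J of J(w)
    grad0 = -Gamma.T @ delta                    # gradient of J at w = 0
    w_bar = -np.linalg.solve(Xi, grad0)         # single Newton step
    vals, vecs = np.linalg.eigh(Xi)
    T = vecs @ np.diag(vals ** -0.5) @ vecs.T   # symmetric Xi^{-1/2}
    return x_fore + Sigma @ w_bar, Sigma @ T    # filter mean and factor
```

In a scalar example with equal forecast and observation variances, the analysis mean lands midway between the forecast mean and the observation, as expected.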

The ETKF

  • The square root Kalman filter analysis is written entirely in terms of the weight vector \( \overline{\pmb{w}} \) and the right-transform matrix \( \mathbf{T} \).

    • The cost of computing \( \boldsymbol{\Xi}_{\mathcal{J}}^{-1} \) and \( \mathbf{T}:=\boldsymbol{\Xi}_{\mathcal{J}}^{-\frac{1}{2}} \) is dominated by the cost of the singular value decomposition of \( \boldsymbol{\Xi}_{\mathcal{J}} \).
  • This is core to the efficiency of the ensemble transform Kalman filter (ETKF),4 using a reduced-rank approximation to the background \( \mathbf{B}^\mathrm{i}_L \).

  • Given an ensemble matrix \( \mathbf{E}^\mathrm{i}_L \in \mathbb{R}^{N_x \times N_e} \)

    • the columns are assumed to be an independent, identically distributed (iid) sample from some distribution corresponding to the label \( \mathrm{i} \).
    • In weather and climate models \( N_e \ll N_x \), and \( N_x \) can be on order \( \mathcal{O}\left(10^9\right) \);5
    • \( N_e \) is typically on order \( \mathcal{O}\left(10^2\right) \).
  • Writing the cost function restricted to the ensemble span, we have a restricted Hessian \( \boldsymbol{\Xi}_{\widetilde{\mathcal{J}}} \) of the form

    \[ \begin{align} \boldsymbol{\Xi}_\mathcal{\widetilde{J}} \in \mathbb{R}^{N_e \times N_e}. \end{align} \]

4. Sakov, P., & Oke, P. R. (2008). Implications of the form of the ensemble transformation in the ensemble square root filters. Monthly Weather Review, 136(3), 1042-1053.
5. Carrassi, A., Bocquet, M., Bertino, L., & Evensen, G. (2018). Data assimilation in the geosciences: An overview of methods, issues, and perspectives. Wiley Interdisciplinary Reviews: Climate Change, 9(5), e535.

The ETKF

  • Using ensemble-based, empirical estimates,

    \[ \begin{align} \hat{\pmb{x}}_L^\mathrm{fore} &= \mathbf{E}_L^\mathrm{fore} \pmb{1} / N_e ; & \hat{\pmb{\delta}}_L &= \mathbf{R}_L^{-\frac{1}{2}}\left(\pmb{y}_L - \mathbf{H}_L \hat{\pmb{x}}_L^\mathrm{fore}\right)\\ \mathbf{X}_L^\mathrm{fore} &= \mathbf{E}_L^\mathrm{fore} - \hat{\pmb{x}}^\mathrm{fore}_L \pmb{1}^\top ; & \mathbf{P}^\mathrm{fore}_L &= \mathbf{X}_L^\mathrm{fore} \left(\mathbf{X}_L^\mathrm{fore}\right)^\top / \left(N_e - 1\right);\\ \mathbf{S}_L &:=\mathbf{R}_L^{-\frac{1}{2}}\mathbf{H}_L \mathbf{X}_L^\mathrm{fore}; \end{align} \]

  • the ensemble-based cost function is written as

    \[ \begin{alignat}{2} & & {\color{#d95f02} {\widetilde{\mathcal{J}} (\pmb{w})} } &= {\color{#1b9e77} {\frac{1}{2} \parallel \hat{\pmb{x}}_L^\mathrm{fore} + \mathbf{X}^\mathrm{fore}_L \pmb{w}- \hat{\pmb{x}}^\mathrm{fore}_L \parallel_{\mathbf{P}^\mathrm{fore}_L}^2} } + {\color{#7570b3} {\frac{1}{2} \parallel \pmb{y}_L - \mathbf{H}_L\hat{\pmb{x}}^\mathrm{fore}_L - \mathbf{H}_L \mathbf{X}^\mathrm{fore}_L \pmb{w} \parallel_{\mathbf{R}_L}^2 } }\\ \Leftrightarrow& & {\color{#d95f02} {\widetilde{\mathcal{J}}(\pmb{w})} } &= {\color{#1b9e77} {\frac{1}{2}(N_e - 1) \parallel\pmb{w}\parallel^2} } + {\color{#7570b3} {\frac{1}{2}\parallel \hat{\pmb{\delta}}_L - \mathbf{S}_L \pmb{w}\parallel^2 } } \end{alignat} \] which is an optimization over a weight vector \( \pmb{w} \) in the ensemble dimension \( N_e \) rather than in the state space dimension \( N_x \).

  • This is a key reduction that makes Monte Carlo methods feasible for the large size of geophysical models.

    • Still, other techniques such as covariance localization6 and hybridization7 are used in practice to overcome the curse of dimensionality.
6. Sakov, P., & Bertino, L. (2011). Relation between two common localisation methods for the EnKF. Computational Geosciences, 15(2), 225-237.
7. Penny, S. G. (2017). Mathematical foundations of hybrid data assimilation from a synchronization perspective. Chaos: An Interdisciplinary Journal of Nonlinear Science, 27(12), 126801.
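The ensemble-space analysis above can be sketched directly. The following assumes a linear observation operator and uses the symmetric square root for the right-transform; the function and variable names are illustrative, not a reference implementation.

```python
import numpy as np

# Sketch of an ETKF-style analysis: the MAP weights and the right-transform
# are computed from an N_e x N_e Hessian in the ensemble span, never forming
# an N_x x N_x covariance.
def etkf_analysis(E, y, H, R):
    N_x, N_e = E.shape
    x_hat = E.mean(axis=1)                       # ensemble mean
    X = E - x_hat[:, None]                       # anomaly matrix X^fore
    R_inv_sqrt = np.linalg.inv(np.linalg.cholesky(R))  # stand-in for R^{-1/2}
    S = R_inv_sqrt @ H @ X                       # weighted obs anomalies
    delta = R_inv_sqrt @ (y - H @ x_hat)         # weighted innovation
    Hess = (N_e - 1) * np.eye(N_e) + S.T @ S     # ensemble-space Hessian
    w_bar = np.linalg.solve(Hess, S.T @ delta)   # MAP weights (one Newton step)
    vals, vecs = np.linalg.eigh(Hess)
    T = vecs @ np.diag(vals ** -0.5) @ vecs.T * np.sqrt(N_e - 1)
    E_filt = (x_hat + X @ w_bar)[:, None] + X @ T
    return E_filt, w_bar, T
```

In a scalar check with unit forecast variance and unit observation error, the analysis mean and variance match the closed-form Kalman posterior.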

The ETKF

  • This is a sketch of the derivation of the local ensemble transform Kalman filter (LETKF).8
  • In this formalism, an ensemble right-transform \( \boldsymbol{\Psi}_k \) exists such that for any \( t_k \), \[ \begin{align} \mathbf{E}^\mathrm{filt}_k = \mathbf{E}^\mathrm{fore}_k \boldsymbol{\Psi}_k \end{align} \] where in the above we would say that (approximately) \[ \begin{align} \mathbf{E}^\mathrm{filt}_k &\sim p(\pmb{x}_k \vert \pmb{y}_{k:1}) \\ \mathbf{E}^\mathrm{fore}_k &\sim p(\pmb{x}_k \vert \pmb{y}_{k-1:1}) \end{align} \]
  • We will associate \( \mathbf{E}^\mathrm{filt}_L \equiv \mathbf{E}^\mathrm{smth}_{L|L} \);
    • under the linear-Gaussian model, we furthermore have that \[ \begin{align} \mathbf{E}^\mathrm{smth}_{k |L} = \mathbf{E}^\mathrm{smth}_{k|L-1}\boldsymbol{\Psi}_{L} & & \mathbf{E}^\mathrm{smth}_{k|L} \sim p(\pmb{x}_k \vert \pmb{y}_{L:1}). \end{align} \]
  • A retrospective smoothing analysis can be performed on all past states stored in memory using the latest right-transform update from the filtering step.
  • This form of retrospective analysis is the basis of the ensemble Kalman smoother (EnKS).9
8. Hunt, B. R., Kostelich, E. J., & Szunyogh, I. (2007). Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. Physica D: Nonlinear Phenomena, 230(1-2), 112-126.
9. Evensen, G., & Van Leeuwen, P. J. (2000). An ensemble Kalman smoother for nonlinear dynamics. Monthly Weather Review, 128(6), 1852-1867.

The EnKS

  • The EnKS takes advantage of the simple form of the retrospective, right-transform analysis by including an additional, inner loop of the filtering cycle.
Diagram of the filter observation-analysis-forecast cycle.
  • Time is the horizontal axis where right moves forward in time.
  • At each time, we produce the standard filtering estimate by computing \( \boldsymbol{\Psi}_L \) from the cost function, and updating the forecast \[ \mathbf{E}^\mathrm{filt}_L = \mathbf{E}_L^\mathrm{fore} \boldsymbol{\Psi}_L. \]
  • The information of incoming observations is passed backward in time using the right-transform to condition the ensemble at past times: \[ \begin{align} \mathbf{E}^\mathrm{smth}_{k|L} = \mathbf{E}^\mathrm{smth}_{k|L-1} \boldsymbol{\Psi}_L. \end{align} \]
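This retrospective update can be sketched as a function of a stored past ensemble and the filter products \( \overline{\pmb{w}} \) and \( \mathbf{T} \); written in anomaly space, it is equivalent to applying the right-transform \( \boldsymbol{\Psi}_L \) above. The names are illustrative, and the inputs are assumed to come from a filter analysis at the current time.

```python
import numpy as np

# Re-condition a stored past ensemble with the weights w_bar and the
# right-transform T computed by the filter analysis at the current time
# (a sketch of E_smth_{k|L} = E_smth_{k|L-1} Psi_L in anomaly form).
def retrospective_update(E_past, w_bar, T):
    x_hat = E_past.mean(axis=1)
    X = E_past - x_hat[:, None]                  # past ensemble anomalies
    return (x_hat + X @ w_bar)[:, None] + X @ T  # mean shift + transform
```

With zero weights and an identity transform the past ensemble is returned unchanged, as expected for an uninformative analysis.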

The 4D cost function

  • However, re-initializing the DA cycle with the smoothed conditional ensemble estimate \( \mathbf{E}^\mathrm{smth}_{0|L} \) can dramatically improve the performance of the subsequent forecast and filtering statistics.
    • Let us denote the composition of the model forecast as, \[ \begin{align} \mathcal{M}_{l:k} : = \mathcal{M}_l \circ \cdots \circ \mathcal{M}_{k} & & \mathbf{M}_{l:k} := \mathbf{M}_{l}\cdots \mathbf{M}_{k} \end{align} \]
  • This exploits the mismatch in nonlinear dynamics between \[ \begin{align} \mathcal{M}_{L:1}\left(\mathbf{E}_{0|L}^\mathrm{smth}\right):= \mathcal{M}_L \circ \cdots \circ \mathcal{M}_1\left(\mathbf{E}_{0|L}^\mathrm{smth}\right) \neq \mathbf{E}_{L}^\mathrm{filt}. \end{align} \]
  • The effectiveness of the linear-Gaussian approximation strongly depends on the length of the forecast window \( \Delta t \);
    • for small \( \Delta t \), the densities are well-approximated with Gaussians, yet there are deformations induced due to nonlinearity.
  • When the dynamics are weakly nonlinear, correlations between the model forecast and the initial condition bring new information into the forecast states in the next DAW.
  • This has been exploited to a great extent by utilizing the 4D cost function;
    • the filtering MAP cost function is extended over multiple observations simultaneously, and in terms of a lagged state directly.

The 4D cost function

  • Suppose now we want to write \( p(\pmb{x}_{L:1}\vert \pmb{y}_{L:1}) \), the joint smoothing posterior over the current DAW, as a recursive update to the last smoothing posterior.
Diagram of the filter observation-analysis-forecast cycle.
  • Using a Bayesian analysis like before, we can write \[ \begin{align} {\color{#d95f02} { p(\pmb{x}_{L:1} \vert \pmb{y}_{L:1}) } } &\propto \int \mathrm{d}\pmb{x}_0 \underbrace{ {\color{#d95f02} { p(\pmb{x}_0 \vert \pmb{y}_{L-S:-S}) } } }_{(1)} \underbrace{ {\color{#7570b3} { \left[ \prod_{k=L-S+1}^L p(\pmb{y}_k \vert \pmb{x}_k) \right] } }}_{(2)} \underbrace{{\color{#1b9e77} { \left[\prod_{k=1}^L p(\pmb{x}_k|\pmb{x}_{k-1}) \right] } }}_{(3)} \end{align} \] where
    1. is the marginal for \( \pmb{x}_0 \) of the last joint smoothing posterior \( p(\pmb{x}_{L-S:-S}\vert\pmb{y}_{L-S:-S}) \);
    2. is the joint likelihood of the incoming observations to the current DAW, given the background forecast;
    3. is the free-forecast with the perfect model \( \mathcal{M}_k \).

The 4D cost function

  • Under the linear-Gaussian assumption, the resulting cost function takes the form \[ \begin{alignat}{2} & & {\color{#d95f02} {\mathcal{J} (\pmb{w})} } &= {\color{#d95f02} {\frac{1}{2} \parallel \overline{\pmb{x}}_{0|L-S}^\mathrm{smth} + \boldsymbol{\Sigma}^\mathrm{smth}_{0|L-S} \pmb{w}- \overline{\pmb{x}}^\mathrm{smth}_{0|L-S} \parallel_{\mathbf{B}^\mathrm{smth}_{0|L-S}}^2} } + {\color{#7570b3} {\sum_{k=L-S+1}^L \frac{1}{2} \parallel \pmb{y}_k - \mathbf{H}_k {\color{#1b9e77} { \mathbf{M}_{k:1}} } {\color{#d95f02} {\overline{\pmb{x}}^\mathrm{smth}_{0|L-S}} } - \mathbf{H}_k {\color{#1b9e77} {\mathbf{M}_{k:1}}} {\color{#d95f02} {\boldsymbol{\Sigma}^\mathrm{smth}_{0|L-S} \pmb{w} }} \parallel_{\mathbf{R}_k}^2 } } \\ \Leftrightarrow& & \mathcal{J}(\pmb{w}) &= \frac{1}{2}\parallel\pmb{w}\parallel^2 + \sum_{k=L-S+1}^L \frac{1}{2}\parallel \overline{\pmb{\delta}}_k - \boldsymbol{\Gamma}_k \pmb{w}\parallel^2 . \end{alignat} \]
  • In the linear-Gaussian case, the solution can again be found by a single iteration of Newton’s descent \[ \begin{align} \nabla_{\pmb{w}} \mathcal{J}:= \pmb{w} - \sum_{k=L-S+1}^{L}\boldsymbol{\Gamma}^\top_k \left(\overline{\pmb{\delta}}_k - \boldsymbol{\Gamma}_k \pmb{w}\right), & & \boldsymbol{\Xi}_{\mathcal{J}} := \mathbf{I} + \sum_{k=L-S+1}^L \boldsymbol{\Gamma}_k^\top \boldsymbol{\Gamma}_k, & & \overline{\pmb{w}} := \pmb{0} - \boldsymbol{\Xi}_{\mathcal{J}}^{-1} \nabla_{\pmb{w}} \mathcal{J}|_{\pmb{w}=\pmb{0}}\\ \overline{\pmb{x}}_{0|L}^\mathrm{smth} = \overline{\pmb{x}}^\mathrm{smth}_{0|L-S} + \boldsymbol{\Sigma}_{0|L-S}^\mathrm{smth} \overline{\pmb{w}} & & \mathbf{T}:= \boldsymbol{\Xi}_{\mathcal{J}}^{-\frac{1}{2}} & &\boldsymbol{\Sigma}_{0|L}^\mathrm{smth}:= \boldsymbol{\Sigma}_{0|L-S}^\mathrm{smth}\mathbf{T} \end{align} \]
  • However, when the state and observation model are nonlinear, using \( \mathcal{H}_k \) and \( \mathcal{M}_{k:1} \) in the cost function, this cost function is solved iteratively to find a local minimum.
  • The difficulty arises in that the gradient \( \nabla_{\pmb{w}} \) actually requires differentiating the equations of motion in \( \mathcal{H}_k\circ\mathcal{M}_{k:1} \).
  • In 4D-VAR, this is performed by an incremental linearization in the tangent linear model and back propagation of sensitivities by the adjoint model;
    • this can also be hybridized with an ensemble in various forms of ensemble-variational (EnVAR) techniques.10
10. Bannister, R. N. (2017). A review of operational methods of variational and ensemble‐variational data assimilation. Quarterly Journal of the Royal Meteorological Society, 143(703), 607-633.

Hybrid EnVAR in the EnKF analysis

  • One alternative to constructing the tangent-linear and adjoint models is to perform a hybrid analysis based on the ETKF.

  • This approach is at the basis of the iterative ensemble Kalman filter / smoother (IEnKF/S).11

  • This technique seeks to perform an ensemble analysis like the square root ETKF by defining the ensemble estimates and the weight vector directly in the ensemble span

    \[ \begin{alignat}{2} & & {\color{#d95f02} {\widetilde{\mathcal{J}} (\pmb{w})} } &= {\color{#d95f02} {\frac{1}{2} \parallel \hat{\pmb{x}}_{0|L-S}^\mathrm{smth} + \mathbf{X}^\mathrm{smth}_{0|L-S} \pmb{w}- \hat{\pmb{x}}^\mathrm{smth}_{0|L-S} \parallel_{\mathbf{P}^\mathrm{smth}_{0|L-S}}^2} } + {\color{#7570b3} {\sum_{k=L-S+1}^L \frac{1}{2} \parallel \pmb{y}_k - \mathcal{H}_k\circ {\color{#1b9e77} { \mathcal{M}_{k:1}\left( {\color{#d95f02} { \hat{\pmb{x}}^\mathrm{smth}_{0|L-S} + \mathbf{X}^\mathrm{smth}_{0|L-S} \pmb{w} } } \right)}}\parallel_{\mathbf{R}_k}^2 } }\\ \Leftrightarrow & & {\color{#d95f02} {\widetilde{\mathcal{J}} (\pmb{w})} } &= {\color{#d95f02} {(N_e - 1) \frac{1}{2} \parallel \pmb{w}\parallel^2} } + {\color{#7570b3} {\sum_{k=L-S+1}^L \frac{1}{2} \parallel \pmb{y}_k - \mathcal{H}_k\circ {\color{#1b9e77} { \mathcal{M}_{k:1}\left( {\color{#d95f02} { \hat{\pmb{x}}^\mathrm{smth}_{0|L-S} + \mathbf{X}^\mathrm{smth}_{0|L-S} \pmb{w} } } \right)}}\parallel_{\mathbf{R}_k}^2 } }. \end{alignat} \]

  • The scheme produces an iterative estimate using a Gauss-Newton-based11 or, e.g., Levenberg-Marquardt-based12 optimization.

11. Bocquet, M., & Sakov, P. (2014). An iterative ensemble Kalman smoother. Quarterly Journal of the Royal Meteorological Society, 140(682), 1521-1535.
12. Bocquet, M., & Sakov, P. (2012). Combining inflation-free and iterative ensemble Kalman filters for strongly nonlinear systems. NPG, 19(3), 383-399.
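A compact sketch of such a Gauss-Newton iteration follows, estimating the sensitivities of \( \mathcal{H}_k \circ \mathcal{M}_{k:1} \) by finite differences of an \( \epsilon \)-rescaled ensemble, in the spirit of the bundle variant. Every name, signature, and default here is an illustrative assumption, not the reference implementation.

```python
import numpy as np

# Gauss-Newton iteration in the ensemble span: at each iterate, propagate an
# epsilon-scaled bundle through the nonlinear model to estimate the matrices
# Gamma_k by finite differences, then take a Newton step on the weights.
def ienks_gauss_newton(x0_hat, X0, y_list, model, obs_op, R_inv_sqrt,
                       n_iter=5, eps=1e-4):
    N_e = X0.shape[1]
    w = np.zeros(N_e)
    for _ in range(n_iter):
        x0 = x0_hat + X0 @ w                 # current smoothed initial state
        xs = x0
        Es = x0[:, None] + eps * X0          # epsilon-scaled bundle ensemble
        grad = (N_e - 1) * w
        Hess = (N_e - 1) * np.eye(N_e)
        for y in y_list:                     # propagate through the DAW
            xs = model(xs)
            Es = np.stack([model(e) for e in Es.T], axis=1)
            y_mean = obs_op(xs)
            Y = np.stack([obs_op(e) for e in Es.T], axis=1)
            Gamma = R_inv_sqrt @ (Y - y_mean[:, None]) / eps  # FD sensitivity
            delta = R_inv_sqrt @ (y - y_mean)
            grad -= Gamma.T @ delta
            Hess += Gamma.T @ Gamma
        w = w - np.linalg.solve(Hess, grad)  # Newton step on the weights
    return x0_hat + X0 @ w
```

With linear model and observation operators the iteration converges in a single step to the linear-Gaussian MAP state.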

The single-iteration ensemble Kalman smoother (SIEnKS)

  • Accuracy can increase with iterations of the 4D estimate, but at the cost of repeated evaluations of the ensemble forecast \( {\color{#1b9e77} { \mathcal{M}_{L:1} } } \).
  • In short-range forecasting the linear-Gaussian approximation of the evolution of the densities is actually an adequate approximation;
    • iterating over the nonlinear dynamics may not be justified by the improvement in the forecast statistics.
  • However, the iterative optimization over a nonlinear observation operator \( \mathcal{H}_k \) or hyper-parameters in the filtering step of the classical EnKS can be run without the additional cost of model forecasts.
    • This can be performed similarly to the IEnKS with the maximum likelihood ensemble filter (MLEF) analysis.13
  • Subsequently, the retrospective, right-transform analysis can be applied to condition the initial ensemble \[ \begin{align} \mathbf{E}^\mathrm{smth}_{0|L} = \mathbf{E}_{0|L-1}^\mathrm{smth} \boldsymbol{\Psi}_L \end{align} \]
  • As with the 4D cost function, one can initialize the next DA cycle in terms of the retrospective analysis, and gain the benefit of the improved initial estimate.
  • This scheme, currently in open review, is the single-iteration ensemble Kalman smoother (SIEnKS).14
13. Zupanski, M., et al. (2008). The Maximum Likelihood Ensemble Filter as a non-differentiable minimization algorithm. Quarterly Journal of the Royal Meteorological Society, 134, 1039–1050.
14. Grudzien, C. and Bocquet, M. (2021): A fast, single-iteration ensemble Kalman smoother for sequential data assimilation. GMD Discussions [preprint], https://doi.org/10.5194/gmd-2021-306, in open review.

The single iteration ensemble Kalman smoother (SIEnKS)

  • Compared to the classical EnKS, this adds an outer loop to the filtering cycle to produce the posterior analysis.
Diagram of the filter observation-analysis-forecast cycle.
  • The information flows from the filtered state back in time from the retrospective analysis.
  • This re-analyzed state becomes the initialization for the next cycle over the shifted DAW, carrying this information forward.
  • The iterative cost function is only solved in the filtering estimate for the new observations entering the DAW.
    • Combined with the retrospective analysis, this comes without the cost of iterating the model forecast over the DAW.
  • For short-range forecasting, this is shown to be an accurate and highly efficient approach to sequential smoothing.

The single iteration ensemble Kalman smoother (SIEnKS)

  • Our key result is an efficient multiple data assimilation (MDA)15 scheme within the EnKS cycle.
    • MDA is a technique based on statistical tempering16 designed to relax the nonlinearity of the Bayesian MAP estimation.
  • In a single data assimilation (SDA) smoother, each observation is only assimilated once so that new observations are only distantly connected to the initial conditions of the simulation;
    • this can introduce many local minima to the 4D cost function, strongly affecting the optimization.17
  • MDA is designed to artificially inflate observation errors and weakly assimilate the same observation over multiple DAWs.
    • This weakens the effect of local minima, gradually bringing the estimator closer to the optimal solution.18
  • However, the SIEnKS treats MDA differently from 4D-EnVAR estimators, using a classic EnKS cycle to weakly assimilate the observations over multiple passes.
    • The filter step in this analysis is used as a boundary condition for the interpolation of the posterior over the lag window.
  • This MDA scheme is demonstrated to be more accurate, stable and cost-effective than EnKF-based 4D-EnVAR schemes in short-range forecasts.
    • Note, however, that the SIEnKS is not as robust as 4D estimators in highly nonlinear forecast dynamics.
15. Emerick, A. A., & Reynolds, A. C. (2013). Ensemble smoother with multiple data assimilation. Computers & Geosciences, 55, 3-15.
16. Neal, R. M. (1996). Sampling from multimodal distributions using tempered transitions. Statistics and computing, 6(4), 353-366.
17. Fillion, A., Bocquet, M., & Gratton, S. (2018). Quasi-static ensemble variational data assimilation: a theoretical and numerical study with the iterative ensemble Kalman smoother. Nonlinear Processes in Geophysics, 25, 315–334.
18. Evensen, G. (2018). Analysis of iterative ensemble smoothers for solving inverse problems. Computational Geosciences, 22, 885–908.
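In the linear-Gaussian limit this tempering is exact: assimilating the same observation over several weak passes, with the error variance inflated by the reciprocal of coefficients that sum to one, reproduces the single-pass posterior. A minimal scalar sketch, with arbitrary illustrative numbers:

```python
import numpy as np

def kalman_update(m, P, y, r):
    """Scalar Kalman analysis: prior mean m and variance P, obs y, obs var r."""
    K = P / (P + r)
    return m + K * (y - m), (1 - K) * P

m0, P0, y, r = 0.0, 2.0, 1.0, 0.5

# single data assimilation (SDA): one pass with the true error variance
m_sda, P_sda = kalman_update(m0, P0, y, r)

# multiple data assimilation (MDA): two weak passes, each with the
# observation error variance inflated by 1/alpha, alpha = 1/2
alpha = 0.5
m_mda, P_mda = kalman_update(m0, P0, y, r / alpha)
m_mda, P_mda = kalman_update(m_mda, P_mda, y, r / alpha)

print(np.isclose(m_sda, m_mda), np.isclose(P_sda, P_mda))  # True True
```

The equivalence follows because Gaussian precisions add over passes; in the nonlinear setting the passes differ, which is what makes the schedule of coefficients a genuine tempering device.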

The single iteration ensemble Kalman smoother (SIEnKS)

Figures: ensemble statistics of the smoother analysis over the DAW.

  • Unlike traditional 4D-EnVAR approaches, the data boundary condition improves the forecast statistics, controlling the forecast error accumulated over the lagged states.
  • Similarly, the interpolation of the posterior estimate remains more stable over the DAW, when the forecast error dynamics are not highly nonlinear.

The single iteration ensemble Kalman smoother (SIEnKS)

  • A wide variety of test cases for short-range forecast systems is presented in our manuscript.19
    • We demonstrate the improved accuracy of the SIEnKS, at a lower leading-order cost, in tests with highly nonlinear observation operators, hyper-parameter optimization and other relevant settings.
  • The two qualifications are that:
    1. these theoretical results are based on the perfect model assumption for simplicity in this initial analysis; and
    2. we have not yet introduced localization or covariance hybridization in this initial study.
  • However, localization / hybridization are likely straightforward extensions, given the short-range forecast regime that we target.
  • Initial results with model error support the case that the SIEnKS can be modified for this regime.
  • Our mathematical results are supported by extensive numerical demonstration with the Julia package DataAssimilationBenchmarks.jl.20
  • A wider survey of methods for Bayesian DA in the geosciences is available in my condensed lecture notes,21 which will be expanded upon in a longer-term book project.22
19. Grudzien, C. and Bocquet, M. (2021): A fast, single-iteration ensemble Kalman smoother for sequential data assimilation. GMD Discussions [preprint], https://doi.org/10.5194/gmd-2021-306, in open review.
20. Grudzien, C., Sandhu, S. (2021). DataAssimilationBenchmarks.jl: a data assimilation research framework. In submission to The Journal of Open Source Software.
21. Grudzien, C. and Bocquet, M. (2021). A Tutorial on Bayesian Data Assimilation. Invited submission to Applications of Data Assimilation and Inverse Problems in the Earth Sciences. Cambridge University Press.
22. Carrassi, A., Grudzien, C., Bocquet, M., Farchi, A., Raanes, P. Data assimilation for dynamical system and their discovery through machine learning. Accepted Book Proposal in Springer-Nature. Target submission in 2023.